MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative
Abstract: Non-negative matrix factorization (NMF) is the problem of determining two non-negative low-rank factors W and H, for the given input matrix A, such that A ≈ WH. NMF is a useful tool for many applications in different domains, such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices with sizes ranging from a few hundred million to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.
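To make the alternating-updating idea concrete, here is a minimal single-process sketch of the Multiplicative Update rule, one of the algorithms the abstract names. It is not the MPI-FAUN implementation: there is no data distribution or MPI communication, and the function names, the iteration count, and the small `eps` safeguard are assumptions chosen for illustration.

```python
import numpy as np


def nmf_multiplicative_update(A, k, iters=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix A (m x n) as W @ H.

    W is m x k and H is k x n, both kept nonnegative.
    Serial illustration only; names and defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # Update H with W held fixed.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        # Update W with the new H held fixed.
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def relative_error(A, W, H):
    """Frobenius-norm error of the factorization, relative to ||A||."""
    return np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Each pass alternates between the two factors, which is the structure the framework parallelizes. In a distributed setting the products involving A (here `W.T @ A` and `A @ H.T`) would be the ones that require interprocessor communication.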
Similar resources
A Projected Alternating Least square Approach for Computation of Nonnegative Matrix Factorization
Nonnegative matrix factorization (NMF) is a common method in data mining that has been used in different applications as a dimension reduction, classification, or clustering method. Methods based on the alternating least squares (ALS) approach are usually used to solve this non-convex minimization problem. At each step of ALS algorithms, two convex least squares problems must be solved, which causes high com...
DSANLS: Accelerating Distributed Nonnegative Matrix Factorization via Sketching Technical Report
Nonnegative matrix factorization (NMF) has been successfully applied in different fields, such as text mining, image processing, and video analysis. NMF is the problem of determining two nonnegative low rank matrices U and V, for a given input matrix M, such that M ≈ UV^T. There is an increasing interest in parallel and distributed NMF algorithms, due to the high cost of centralized NMF on large...
MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling
Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable drawback of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...
Determining Malmquist Productivity Index in DEA and DEA-R based on Value Efficiency
Malmquist Productivity Index (MPI) is a numeric index that is of great importance in measuring productivity and its changes. In recent years, tools like DEA have been utilized for determining MPI. In the present paper, some models are recommended for calculating MPI when there are just ratio data available. Then, using DEA and DEA-R, some models are proposed under the constant returns to scale ...
DOPPLER-DERIVED RIGHT VENTRICULAR MYOCARDIAL PERFORMANCE INDEX IN NEONATES: NORMAL VALUES
Doppler-derived myocardial performance index (MPI), defined as the sum of isovolumetric contraction and relaxation durations divided by ejection time, is an easily measured and reproducible index that shows both systolic and diastolic myocardial function. The goal of this study was to define normal values of right ventricular MPI in neonates in the first 48 to 72 hours of life. Fifty-one...